

Guest Column: Small Depth Quantum Circuits¹

Debajyoti Bera², Frederic Green³, Steven Homer²

Abstract

Small depth quantum circuits have proved to be unexpectedly powerful in comparison to their classical counterparts. We survey some of the recent work on this and present some open problems.


1 Introduction and Motivation

Quantum circuits are the most natural and general formulation of quantum computation. They are general in the sense that they are a universal model; any quantum computation can be efficiently simulated by a quantum circuit [27, 19]. They are natural in that a quantum computation (or quantum algorithm) can best be understood as a collection of qubits being acted on by quantum operators, each defined by a tensor product of quantum gates. Although the quantum circuit model is quite different from the classical one, it has nevertheless proven fruitful to look to classical circuit models for insight. Classical circuits of small depth⁴ (e.g., polylogarithmic, as in the class NC) have been proposed as realistic models of parallel computation. Furthermore, very small (i.e., constant) depth classical circuits present us with computational models for which we can actually prove interesting lower bounds. What happens when we extend these concepts to quantum circuits?

The answer is a bit surprising, and leads fairly quickly to interesting variants of the fundamental problems of quantum computing. For example, consider the class AC0 of constant-depth, polynomial-size circuits consisting of NOT together with unbounded fan-in and fan-out AND and OR gates. Some interesting functions can be computed in this class (addition of n-bit numbers, for example), and some cannot (e.g., parity [11]). It would be interesting to see what an analogous quantum complexity class would look like. But if we try to translate the class AC0 into the quantum setting, we are faced with an immediate problem. The unbounded fan-in “quantum AND gate”, the generalized Toffoli gate (which also encompasses negation and hence OR), does not allow for fanout in any way we may take for granted. The reason is the “no-cloning theorem”, which says that it is in general not possible to make a copy of an unknown quantum state. One can, however, make copies of classical bits, which suggests the idea of introducing a unitary operator to implement fanout of classical bits. (A similar operation is really implicit in the AC0 model: we solder multiple outgoing wires to an AND gate, obtaining copies of the output. While it is not entirely realistic to have an unbounded number of wires fanning out of a gate even classically, this is a useful abstraction out of which the definition of AC0 arises.)

²Boston University, {dbera|homer}@cs.bu.edu.
⁴See Vollmer [26] for the basic definitions and facts about classical circuit classes.

We therefore seem to have two candidates for quantum analogs of AC0. One, which includes just generalized Toffoli gates and single-qubit gates, is called QAC0. The other, which also includes fanout gates, is called QAC0_wf. The latter appears to be the more realistic version, since it is straightforward to see that it includes AC0. However, we will see in this article that QAC0_wf is much more powerful than its classical counterpart. There nevertheless remains much that we do not understand. For example, the classical version of QAC0 (in which AND and OR gates have at most constant fanout) is provably weaker than AC0. In stark contrast, it is unknown to this day whether QAC0 is the same as QAC0_wf.

This gives an interesting theoretical framework in which quantum analogs of classical circuit models are provably more powerful. But does it have bearing on reality? It might. Realizable quantum computations will have very limited duration, due to short coherence times, which suggests that highly parallelized quantum circuits (such as those we will see can be obtained using fanout) are desirable. Fanout gates as well as Toffoli gates might actually be feasible to build via ion-trap [3] or bulk NMR [12] techniques. It may therefore be of more than mere theoretical interest to explore their power, both intrinsic and relative to each other.

2 Circuit Elements and Classes

In this brief survey, we will be unable to provide a review of quantum computation. For a quick introduction, we recommend Fenner [7] or Fortnow and Rogers [10]; for an in-depth treatment, see Nielsen and Chuang [17].

To make the treatment as self-contained as possible, we introduce the following notation. Let $\mathcal{H}$ denote the 2-dimensional Hilbert space spanned by the computational basis states $|0\rangle, |1\rangle$. Let $\mathcal{H}_1, \ldots, \mathcal{H}_n$ be n copies of $\mathcal{H}$. By $\mathcal{B}_n$ we denote the $2^n$-dimensional Hilbert space $\mathcal{H}_1 \otimes \cdots \otimes \mathcal{H}_n$ spanned by the usual set of computational basis states of the form $|x_1, \ldots, x_n\rangle$, where each $x_i \in \{0, 1\}$. A “state over a set of n bits” is a state in $\mathcal{B}_n$. Let $\mathcal{U}_n$ denote the set of unitary matrices that act on states in $\mathcal{B}_n$ ($\mathcal{U}_n$ is just a convenient notation for the group $U(2^n)$). A quantum gate G corresponds to an element of $\mathcal{U}_n$, which is also denoted G. Thus, for example, a single-qubit gate is an element of $\mathcal{U}_1$, acting on states in $\mathcal{B}_1$.

We now exhibit the main quantum gates that we will consider. The single-qubit Hadamard gate is defined by
$$H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$

Let $f : \{0,1\}^n \to \{0,1\}$ be a Boolean function of n inputs. Many of our gates take the form
$$G|x_1, \ldots, x_n, b\rangle = |x_1, \ldots, x_n, b \oplus f(x_1, \ldots, x_n)\rangle.$$


This is just an easy way of simulating any classical function in a reversible manner. If $f(x_1, \ldots, x_n) = \bigwedge_{i=1}^n x_i$, G is called a generalized Toffoli gate, or (in this paper) simply a Toffoli gate, and is written as T. If $f(x_1, \ldots, x_n) = \bigoplus_{i=1}^n x_i$, G is a parity gate, written P. Generalizing this, define the classical Boolean function $\mathrm{Mod}_q : \{0,1\}^n \to \{0,1\}$ so that $\mathrm{Mod}_q(x_1, \ldots, x_n) = 1$ iff $\sum_{i=1}^n x_i \not\equiv 0 \pmod q$. If $f = \mathrm{Mod}_q$, we call G a MOD_q gate. Next, for any t, define the Boolean function $s(x_1, \ldots, x_n) = 1$ iff $\sum_{i=1}^n x_i \ge t$. If f = s, we call G a threshold gate and write it as S. Finally, the fanout gate F is defined by
$$F|x_1, \ldots, x_n, b\rangle = |b \oplus x_1, \ldots, b \oplus x_n, b\rangle.$$
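Because each of these gates maps computational basis states to computational basis states, its definition can be checked classically. The minimal Python sketch below is our own illustration (all function names are hypothetical): it encodes T, P, MOD_q, S and F as maps on bit tuples and confirms that each is a bijection on the $2^{n+1}$ basis states, which is exactly why these gates are unitary. With n = 1 the Toffoli map is CNOT.

```python
from itertools import product

# A basis state |x_1..x_n, b> is encoded as (x, b), with x a tuple of bits.

def toffoli(x, b):
    """T: b XOR AND(x_1..x_n); with n = 1 this is CNOT."""
    return x, b ^ int(all(x))

def parity(x, b):
    """P: b XOR x_1 XOR ... XOR x_n."""
    return x, b ^ (sum(x) % 2)

def mod_gate(q):
    """MOD_q: b XOR [x_1 + ... + x_n is not 0 mod q]."""
    return lambda x, b: (x, b ^ int(sum(x) % q != 0))

def threshold_gate(t):
    """S: b XOR [x_1 + ... + x_n >= t]."""
    return lambda x, b: (x, b ^ int(sum(x) >= t))

def fanout(x, b):
    """F: |x_1..x_n, b> -> |b XOR x_1, ..., b XOR x_n, b>."""
    return tuple(xi ^ b for xi in x), b

def is_permutation(gate, n):
    """True iff gate maps the 2^(n+1) basis states bijectively onto themselves."""
    states = [(x, b) for x in product((0, 1), repeat=n) for b in (0, 1)]
    return len({gate(x, b) for x, b in states}) == len(states)

print(is_permutation(fanout, 3))  # True
print(toffoli((1, 1, 1), 0))      # ((1, 1, 1), 1)
```

Note that MOD_2 coincides with the parity gate P, which is why the QACC[2] classes below are tied so closely to parity.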

It is known that the T-gate for n = 1 (known as controlled-NOT, or “CNOT”) together with single-qubit gates (in particular, the Hadamard, phase, and π/8 gates) forms a universal set of gates, in that any unitary operator can be approximated to an arbitrary degree of precision with them.

A quantum circuit is constructed out of layers. Each layer L is a tensor product of gates drawn from a certain fixed set. A circuit C is simply a (matrix) product of layers $C = L_1 L_2 \cdots L_d$. (Observe that the “last” layer $L_d$ is actually the one that is applied directly to the inputs, and $L_1$ is the output layer.) The number of layers d is called the depth of C. A circuit C over n qubits is then a unitary operator in $\mathcal{U}_n$. We say that C computes a unitary operator U exactly if, for all computational basis states, $C|x_1, \ldots, x_n\rangle = U|x_1, \ldots, x_n\rangle$. This is in general too restrictive, however. One must allow for the presence of “work bits”, called ancillae, that make extra space available in which to do a computation. In that case, in order to exactly compute the operator U we extend the Hilbert space in which C acts to the $2^{n+m}$-dimensional space $\mathcal{B}_{n+m}$ spanned by computational basis states $|x_1, \ldots, x_n, a_1, \ldots, a_m\rangle$, where again $x_i, a_i \in \{0,1\}$, the $a_i$ serving as ancillae. Then we say that C cleanly computes U if, for any $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$,
$$\langle y_1, \ldots, y_n, 0, \ldots, 0|\,C\,|x_1, \ldots, x_n, 0, \ldots, 0\rangle = \langle y_1, \ldots, y_n, 0, \ldots, 0|\,(U \otimes I)\,|x_1, \ldots, x_n, 0, \ldots, 0\rangle,$$
where I is the identity on the ancillae, and the number of 0's in each state above is m. That is, C does a clean computation if the ancillae all begin and end as 0's. We assume that all of our circuits perform clean computations. This is a reasonable constraint, since only then is it easy to compose circuits.

All circuits should be understood to be elements of an infinite family of circuits $\{C_n \mid n > 0\}$, where $C_n$ is a quantum circuit for n input qubits and each circuit family draws its gates from a fixed finite set.

In this paper we deal with various quantum circuit classes which are defined in analogy with the classical circuit classes; e.g., similar to NCk, QNCk is defined as the class of $O(\log^k n)$-depth, polynomial-size circuits containing only single-qubit and CNOT gates. We list some of the quantum circuit classes below:


Definition 2.1 Quantum circuit classes
Quantum analogues of NCk

QNCk: consisting of single-qubit and CNOT gates (Toffoli gates with n = 1).

QNCk_wf: QNCk + fanout gates.

Quantum analogues of ACk

QACk: consisting of single-qubit and Toffoli gates.

QACk_wf: consisting of single-qubit, Toffoli and fanout gates.

Quantum analogues of ACCk

QACCk: QACCk[q] is QACk + MOD_q gates; QACCk = $\bigcup_q$ QACCk[q].

QACC: QACC is defined as QACC0, and QACC[q] is defined as QACC0[q].

Quantum analogues of TCk

QTCk: QACk + arbitrary fan-in threshold gates.

QTCk_wf: QACk_wf + arbitrary fan-in threshold gates.

It should be emphasized that these are classes of unitary operators. There are various ways in which they can be used to define classes of sets, but we will not explore that here.

This is an unacceptably large array of complexity class definitions (perhaps even gratuitous). Surprisingly and happily, however, most of them are either the same or very close, unlike the corresponding classical classes.

3 Upper Bounds

The first hint that something different is going on in quantum circuits is in the intimate relationship between fanout and parity. There is no obvious a priori relation between these operators, and indeed we wouldn't expect there to be any on the basis of our experience with classical circuits. But as was observed by Moore [15], F is conjugate to P via an (n+1)-fold tensor product of Hadamards applied to all the bits:
$$F = H^{\otimes(n+1)}\, P\, H^{\otimes(n+1)}. \tag{2}$$
This is a consequence of the well-known fact that a CNOT gate, conjugated with Hadamards, swaps the roles of its control and target bits (see Figure 1). P is a product of n CNOTs, each controlled by some $x_i$ and targeting b; after conjugation each becomes a CNOT controlled by b and targeting $x_i$, and the product of these is F.
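Identity (2) is easy to confirm numerically for small n. The sketch below is our own illustration (the helper names are hypothetical): it stores a state as a dictionary from bit tuples $(x_1, \ldots, x_n, b)$ to amplitudes, applies $H^{\otimes(n+1)}$ as a Walsh-Hadamard transform, applies P and F as permutations of basis states, and checks that $H^{\otimes(n+1)} P H^{\otimes(n+1)}$ agrees with F on every basis state, which by linearity gives (2).

```python
import itertools
import math

# A state is a dict mapping basis tuples (x_1, ..., x_n, b) to amplitudes.

def hadamard_all(state, num_qubits):
    """Apply H to every qubit: |z> -> 2^(-N/2) * sum_y (-1)^(z.y) |y>."""
    scale = 1 / math.sqrt(2 ** num_qubits)
    out = {}
    for y in itertools.product((0, 1), repeat=num_qubits):
        amp = sum(a * (-1) ** sum(zi * yi for zi, yi in zip(z, y))
                  for z, a in state.items())
        if abs(amp) > 1e-12:
            out[y] = amp * scale
    return out

def apply_classical(state, gate):
    """Apply a reversible classical gate, i.e. a permutation of basis states."""
    out = {}
    for z, a in state.items():
        image = gate(z)
        out[image] = out.get(image, 0) + a
    return out

def parity_gate(z):
    """P|x_1..x_n, b> = |x_1..x_n, b XOR x_1 XOR ... XOR x_n>."""
    *x, b = z
    return (*x, b ^ (sum(x) % 2))

def fanout_gate(z):
    """F|x_1..x_n, b> = |b XOR x_1, ..., b XOR x_n, b>."""
    *x, b = z
    return (*(xi ^ b for xi in x), b)

def conjugacy_holds(n):
    """Check H^(n+1) P H^(n+1) |z> == F |z> for every basis state z."""
    num_qubits = n + 1
    for z in itertools.product((0, 1), repeat=num_qubits):
        start = {z: 1.0}
        lhs = hadamard_all(apply_classical(hadamard_all(start, num_qubits), parity_gate), num_qubits)
        rhs = apply_classical(start, fanout_gate)
        if any(abs(lhs.get(k, 0) - rhs.get(k, 0)) > 1e-9 for k in set(lhs) | set(rhs)):
            return False
    return True

print(conjugacy_holds(3))  # True
```

The brute-force Hadamard transform costs $4^{n+1}$ operations per state, which is fine for the handful of qubits used in such a sanity check.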


Figure 1: The parity and fanout gates are conjugates of each other by a layer of Hadamard gates.

It is immediate from this that QAC0_wf = QACC0_wf[2] = QACC[2]. Contrast this with the famous classical result of Furst, Saxe and Sipser [11] that parity is not in AC0 (and hence ACC[2] ≠ AC0).

It is also immediate that for all k, QACk_wf = QACCk_wf[2] = QACCk[2]. Since it is possible to fan out n copies in O(log n) depth using CNOT gates (via divide and conquer), we also have QACk ⊆ QACk_wf ⊆ QAC(k+1) (subsequent results in this survey lead to similar relationships for the k > 0 classes).
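The divide-and-conquer fanout can be made concrete. The sketch below is our own construction, not one taken from the cited work, and its names are hypothetical: it copies b into n − 1 zeroed ancillae by repeated doubling, XORs one copy into each $x_j$, and then replays the copying layers in reverse so that the ancillae end clean. It uses $2\lceil \log_2 n \rceil + 1$ CNOT layers, which is O(log n). Because CNOT gates only permute basis states, simulating them on classical bits is enough to verify the construction.

```python
import math
from itertools import product

def fanout_layers(n):
    """Layers of CNOTs (control, target) that apply F to wires b, x_1..x_n.

    Wires are named "b", ("x", j) and ancillae ("a", k), which start at 0.
    Copy b into n - 1 ancillae by doubling, XOR one copy into each x_j, then
    replay the copying layers in reverse so every ancilla returns to 0.
    """
    holders, copy_layers, num_anc = ["b"], [], 0
    while len(holders) < n:
        layer = []
        for control in holders:
            if len(holders) + len(layer) == n:
                break
            num_anc += 1
            layer.append((control, ("a", num_anc)))
        holders = holders + [target for _, target in layer]
        copy_layers.append(layer)
    middle = [(holders[j], ("x", j + 1)) for j in range(n)]
    return copy_layers + [middle] + copy_layers[::-1], num_anc

def run(layers, b, xs, num_anc):
    """Simulate the CNOT layers on classical bits (enough, since CNOTs permute basis states)."""
    wires = {"b": b}
    wires.update({("x", j + 1): v for j, v in enumerate(xs)})
    wires.update({("a", k): 0 for k in range(1, num_anc + 1)})
    for layer in layers:
        used = [w for gate in layer for w in gate]
        assert len(used) == len(set(used)), "gates within a layer must be disjoint"
        for control, target in layer:
            wires[target] ^= wires[control]
    return wires

layers, num_anc = fanout_layers(5)
print(len(layers), num_anc)  # 7 4
```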

The QACC[q] classes for q ≠ 2 present a more subtle problem. Recall the Razborov/Smolensky theorem [21, 24], which says that for distinct primes p and q, ACC[q] ⊄ ACC[p], and hence ACC[q] ⊄ AC0. One would think, in accordance with this, that the QACC[q] classes are all incomparable with QACC[2]. In fact, the opposite turns out to be true: QACC[q] = QACC[2] for all q.


Figure 2: If a group of gates $U_1, \ldots, U_n$ are simultaneously diagonalizable ($U_i = V D_i V^\dagger$) or are the same, then they can be applied in parallel using fanout gates. This holds even if the gates are controlled by some control qubit.

The proof that QACC[q] ⊆ QACC[2] uses the most important technique to date in this area: parallelization, which was first observed by Moore and Nilsson [16]. A series of commuting (for example, identical) unitary operations can be implemented in a constant number of layers by diagonalizing the operators and using fanout to apply the diagonal operators in parallel. The process is sketched in Figure 2. In this case, the operations we want to parallelize are those that increment a register mod q:
$$M_q|x\rangle = |(x + 1) \bmod q\rangle,$$
where x is the value held in the register. Applying n controlled-$M_q$ gates in series (each $M_q$ controlled by one of the $x_i$'s) leaves the sum of the bits mod q in the register. Once this is done, applying a Toffoli gate (with negations, so that it computes the OR of the register bits) to the register yields $\mathrm{Mod}_q(x_1, \ldots, x_n)$. Thus, via parallelization, we can compute MOD_q in constant depth using fanout.
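A small numerical sketch of the parallelization idea for $M_q$ follows; it is our own illustration, and the function names are hypothetical. With $Q|x\rangle = q^{-1/2}\sum_j \omega^{jx}|j\rangle$ and $\omega = e^{2\pi i/q}$, one has $M_q = Q^\dagger D Q$ for $D = \mathrm{diag}(1, \omega, \ldots, \omega^{q-1})$, so in the Fourier basis a controlled increment only multiplies component j by $\omega^{j x_i}$. These diagonal factors commute, which is what allows fanout to apply them simultaneously; the code multiplies them one after another only because it is a classical simulation.

```python
import cmath

def dft_matrix(q):
    """Q with entries <j|Q|x> = omega^(j*x) / sqrt(q), omega = e^(2*pi*i/q)."""
    omega = cmath.exp(2j * cmath.pi / q)
    return [[omega ** (j * x) / q ** 0.5 for x in range(q)] for j in range(q)]

def apply(matrix, vec):
    return [sum(row[k] * vec[k] for k in range(len(vec))) for row in matrix]

def dagger(matrix):
    """Conjugate transpose."""
    return [list(col) for col in zip(*[[v.conjugate() for v in row] for row in matrix])]

def basis(q, x):
    return [1.0 if k == x else 0.0 for k in range(q)]

def increment_is_diagonalized(q, tol=1e-9):
    """Check Q^dagger D Q |x> == |(x + 1) mod q> for every x, with D = diag(omega^j)."""
    Q, omega = dft_matrix(q), cmath.exp(2j * cmath.pi / q)
    for x in range(q):
        v = apply(Q, basis(q, x))
        v = [omega ** j * v[j] for j in range(q)]
        w = apply(dagger(Q), v)
        if any(abs(a - b) > tol for a, b in zip(w, basis(q, (x + 1) % q))):
            return False
    return True

def mod_q_register(xs, q):
    """Register value after the n controlled-M_q gates, computed in the Fourier basis."""
    Q, omega = dft_matrix(q), cmath.exp(2j * cmath.pi / q)
    v = apply(Q, basis(q, 0))
    for x_i in xs:
        # Controlled-M_q by x_i is diagonal here: component j gains omega^(j * x_i).
        # These factors commute, so with fanned-out copies they could act at once.
        v = [omega ** (j * x_i) * v[j] for j in range(q)]
    out = apply(dagger(Q), v)
    return max(range(q), key=lambda k: abs(out[k]))

xs = [1, 0, 1, 1, 1]
register = mod_q_register(xs, 3)
print(register, int(register != 0))  # 1 1
```

The final `int(register != 0)` plays the role of the negated Toffoli (OR) gate that turns the register contents into $\mathrm{Mod}_q(x_1, \ldots, x_n)$.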

The other direction, QACC[2] ⊆ QACC[q], essentially amounts to showing that there is nothing really special about “2” in the conjugacy relation between fanout and parity. This relationship can be generalized to an analogous one between “fanout of digits base q” and “sum of the inputs mod q”. This requires a Fourier transform that works on “quantum digit” (rather than qubit) registers, which is analogous to the Hadamard gate. By a result of Barenco et al. [2], such a constant-dimensional unitary transformation can be realized in constant depth with just CNOT and single-qubit gates. With considerable added circuitry to represent the digits as bits, these techniques result in the following:

Theorem 3.1 [13] For all q, QAC0_wf = QACC[q] = QACC.

The biggest surprise is yet to come. Consider the class QNC0. Like its classical counterpart, it is not very useful; as we explain later on, no output can depend on all the inputs! What if we add fanout, as in the class QNC0_wf? Høyer and Špalek showed that, with high accuracy, one can compute threshold functions with only fanout and single-qubit gates, in constant depth!


Theorem 3.2 [14] Let $\{F_n\}$ be a family of operators in QNC0_wf, QAC0_wf, or QTC0_wf. Then there is a family of operators $\{G_n\}$ in either of the other classes that approximates $\{F_n\}$ with two-sided polynomially small error.

In a strong sense, therefore, the classes QNC0_wf, QAC0_wf and QTC0_wf (to say nothing of QACC and all its kindred) are equivalent in computational power. The theorem essentially says that, for constant-depth circuits, fanout and single-qubit operations form a universal set of quantum gates.

Proof sketch: The proof of Theorem 3.2 is centered around the parallelization method. To give the clearest exposition possible, we find it advantageous to follow (initially) an earlier technique of Špalek [25]. Following exactly his line of reasoning, we sketch how to simulate an exact threshold gate, in which the Boolean function f in the definition is 1 iff $\sum_{i=1}^n x_i = t$; using this, threshold gates can be easily constructed. The main roadblock in generalizing Theorem 3.1 to Theorem 3.2 is that the proof of the former relies heavily on the fixed dimensionality of the mod q increment operator which, by Barenco et al., can be simulated in constant depth. In order to implement an exact gate, we need to compute the sum of the inputs not mod q, but mod n, where n is the number of inputs (technically, mod n + 1). Denoting $\lceil \log(n+1) \rceil$ by k, the required k-bit increment operator M is defined as

$$M|x\rangle = |(x + 1) \bmod 2^k\rangle,$$

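where x denotes a k-bit number. The dimension of M grows exponentially in k, so we can no longer rely on Barenco et al. The way around this problem starts with a simple but quite interesting observation, namely that M is diagonal in the Fourier basis. This is the basis of the Hilbert space that is obtained when we perform the quantum Fourier transform
$$Q|x\rangle = \frac{1}{\sqrt{2^k}} \sum_{y=0}^{2^k-1} \omega^{xy}|y\rangle,$$
where $\omega = e^{2\pi i/2^k}$. Specifically, a straightforward computation shows that $M = Q^\dagger D Q$, where $D = \mathrm{diag}(1, \omega, \omega^2, \omega^3, \ldots, \omega^{2^k-1})$. Although D is a “big” operator, it nevertheless can be written as a tensor product of single-qubit operators of the form $|b\rangle \mapsto \omega^{2^{k-i} b}|b\rangle$, where $b \in \{0,1\}$ and i indexes the qubit. Thus if $x = x_1 \cdots x_k$ is a k-bit string with $x_1$ the most significant bit,
$$D|x\rangle = \omega^{\sum_{i=1}^k 2^{k-i} x_i}|x\rangle = \bigotimes_{i=1}^k \left(\omega^{2^{k-i} x_i}|x_i\rangle\right).$$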
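The factorization of D can be checked directly. The sketch below is our own (the function names are hypothetical): it builds the $2^k$ diagonal entries $\omega^x$ of D and, separately, the Kronecker product of the k single-qubit phase gates $|b\rangle \mapsto \omega^{2^{k-i} b}|b\rangle$, taking qubit i = 1 as the most significant bit. The two agree, so D needs only k single-qubit gates rather than one $2^k$-dimensional gate.

```python
import cmath

def big_D(k):
    """Diagonal of D = diag(1, omega, ..., omega^(2^k - 1)), omega = e^(2*pi*i/2^k)."""
    omega = cmath.exp(2j * cmath.pi / 2 ** k)
    return [omega ** x for x in range(2 ** k)]

def kron_diag(a, b):
    """Diagonal of diag(a) (x) diag(b); the first factor is the more significant bit."""
    return [u * v for u in a for v in b]

def D_from_single_qubit_phases(k):
    """Tensor product over i = 1..k of |b> -> omega^(2^(k-i) * b)|b> (qubit 1 = MSB)."""
    omega = cmath.exp(2j * cmath.pi / 2 ** k)
    diag = [1]
    for i in range(1, k + 1):
        diag = kron_diag(diag, [1, omega ** (2 ** (k - i))])
    return diag

full, factored = big_D(4), D_from_single_qubit_phases(4)
print(max(abs(u - v) for u, v in zip(full, factored)) < 1e-9)  # True
```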


Figure 3: Compute $\sum_{i=1}^n x_i$ in constant depth by using Q, fanout gates and controlled-D gates. The n D-gates are parallelized here to reduce the depth.

For computing the sum of the inputs, we really need the M (and hence D) operator to be controlled by each of the respective inputs. In fact, using known properties of $\mathcal{U}_1$, the controlled-D operator can be implemented in constant depth with single-qubit and CNOT gates. The next step is to use this to compute $\sum_{i=1}^n x_i$, and indeed this can be done with the aid of Q, as shown in Figure 3. In the working register of log n bits, the output is exactly the sum of the $x_i$'s. The problem is that we don't know whether Q can be implemented in constant depth (we will see that it can, but we must avoid circular reasoning). Špalek's inspired fix for this was simply to replace Q and $Q^\dagger$ with layers of Hadamards, and to apply a $D^{-t}$ operator to another set of fanned-out inputs. The output is no longer the sum of the $x_i$'s, but it is enough for our purposes: the bits in the output are all 0 if and only if $\sum_{i=1}^n x_i - t = 0$. With the aid of a Toffoli gate, we can now determine whether $\sum_{i=1}^n x_i = t$, and this is done in constant depth.
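Špalek's replacement of Q by Hadamard layers can be checked with a few lines of arithmetic. Starting from $|0^k\rangle$, a Hadamard layer produces the uniform superposition over y; the controlled D gates together with $D^{-t}$ multiply $|y\rangle$ by $\omega^{ys}$ with $s = \sum_i x_i - t$; and after the second Hadamard layer the amplitude of $|0^k\rangle$ is $2^{-k}\sum_y \omega^{ys}$, which is 1 when s = 0 and 0 otherwise, provided $2^k > n$. The sketch below (ours; the function name is hypothetical) evaluates exactly this amplitude.

```python
import cmath

def exact_threshold_amplitude(xs, t, k):
    """Amplitude of |0^k> after H^(x)k, controlled D^(x_i) for each input, D^(-t), H^(x)k.

    Requires 2^k > len(xs) so that sum(xs) - t cannot wrap around to 0 mod 2^k.
    """
    N = 2 ** k
    omega = cmath.exp(2j * cmath.pi / N)
    total = 0
    for y in range(N):                  # H^(x)k |0^k> is the uniform superposition over y
        phase = omega ** (-t * y)       # contribution of D^(-t)
        for x in xs:                    # one controlled-D per fanned-out input;
            phase *= omega ** (x * y)   # diagonal factors commute, so order is irrelevant
        total += phase / N              # <0^k|H^(x)k|y> = 1/sqrt(N), used twice
    return total

print(round(abs(exact_threshold_amplitude([1, 1, 0, 1, 0], 3, 3)), 6))  # 1.0
```

When the sum differs from t the amplitude on $|0^k\rangle$ vanishes, so the final Toffoli (OR) on the register cleanly separates the two cases.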

Thus far we can already see a quite interesting result. If the definition of QAC0_wf allowed an unbounded number of different single-qubit gates in a circuit family (in this case, the phase operations that make up D, whose angles depend on n), we could conclude from the above argument that QAC0_wf = QTC0_wf exactly. However, the definition stipulates that only a finite number of such gates are allowed in any circuit family. Høyer and Špalek [14] showed how to overcome this problem: any single-qubit gate can be approximated to arbitrary accuracy, in constant depth, by gates from a fixed finite set. The cost is a polynomially larger circuit and polynomially small ($1/n^c$) two-sided error. This is enough to establish the part of Theorem 3.2 regarding the equivalence of QAC0_wf and QTC0_wf.

By further applications of single-qubit gates (or their approximations) and fanouts, Høyer and Špalek were able to reduce the computation of the final Toffoli gate to a measurement of a quantum register whose outcome agrees with the OR of its inputs, with small error. Dispensing with the Toffoli gate in this way gives the theorem as stated and concludes our proof sketch.

There are a number of immediate corollaries that follow from Theorem 3.2 when combined with some results on classical threshold circuits [23]. For example, iterated multiplication, division, and sorting of n integers can be done with polynomial-size TC0-type circuits. Hence these operations can also be approximated in QNC0_wf.


3.1 Quantum Fourier Transform

We saw the role of the quantum Fourier transform (QFT) in the construction above. The QFT is one of the most widely used unitary transformations in quantum circuits. It is one of the key components of many quantum algorithms, like Shor's [22] quantum algorithm for factoring. An efficient implementation of the QFT will improve a wide variety of quantum circuits and algorithms. Before we discuss low-depth circuits for the QFT, it is interesting to compare it to its classical counterpart, the discrete Fourier transform (DFT). The m-dimensional DFT maps $(a_0, \ldots, a_{m-1}) \in \mathbb{C}^m$ to $(b_0, \ldots, b_{m-1}) \in \mathbb{C}^m$, where
$$b_x = \sum_{y=0}^{m-1} e^{(2\pi i/m)xy} a_y.$$
The fast Fourier transform algorithm can compute the DFT in $O(m \log m)$ operations.
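For comparison with the quantum circuits that follow, here is a minimal recursive radix-2 FFT written with the same sign convention as the DFT formula above ($e^{+2\pi i/m}$), checked against the direct $O(m^2)$ sum. This is the textbook Cooley-Tukey recursion rather than anything specific to this survey, and it assumes m is a power of 2.

```python
import cmath

def dft(a):
    """Direct O(m^2) DFT: b_x = sum_y e^{2*pi*i*x*y/m} a_y."""
    m = len(a)
    return [sum(cmath.exp(2j * cmath.pi * x * y / m) * a[y] for y in range(m)) for x in range(m)]

def fft(a):
    """Radix-2 Cooley-Tukey FFT with the same convention; len(a) must be a power of 2."""
    m = len(a)
    if m == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])   # two DFTs of size m/2
    out = [0j] * m
    for x in range(m // 2):
        twiddle = cmath.exp(2j * cmath.pi * x / m) * odd[x]
        out[x] = even[x] + twiddle            # b_x = E_x + omega^x O_x
        out[x + m // 2] = even[x] - twiddle   # omega^(m/2) = -1
    return out

a = [complex(i % 5, (3 * i) % 7) for i in range(16)]
print(max(abs(u - v) for u, v in zip(fft(a), dft(a))) < 1e-9)  # True
```

The recursion halves the problem at each level, giving the $O(m \log m)$ operation count; note that it outputs all m coefficients explicitly, which the QFT does not.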

The m-dimensional quantum Fourier transform can be seen as a unitary operation performing the DFT on the amplitudes of a (log m)-qubit state, mapping $\sum_{x=0}^{m-1} a_x|x\rangle$ to $\sum_{x=0}^{m-1} \beta_x|x\rangle$, where
$$\beta_x = \frac{1}{\sqrt{m}} \sum_{y=0}^{m-1} e^{(2\pi i/m)xy} a_y.$$

The rest of this section assumes $m = 2^n$ and uses $\omega = e^{2\pi i/2^n}$. Coppersmith [6] showed how to compute the QFT in $O(n^2)$ size and depth.⁵ The bounds have been reduced further to sub-quadratic size and linear depth.

One approach to computing the QFT in constant depth can be obtained by inspecting the state of each qubit in the transformed state:
$$|\Psi_x\rangle = Q|x\rangle = \frac{1}{2^{n/2}} \sum_{y=0}^{2^n-1} \omega^{xy}|y\rangle = \bigotimes_{k=1}^{n} \frac{1}{\sqrt{2}}\left(|0\rangle + e^{2\pi i x/2^k}|1\rangle\right).$$

The “rotations” $|0\rangle \mapsto \frac{1}{\sqrt{2}}\sum_{b=0}^{1} e^{2\pi i x b/2^k}|b\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + e^{2\pi i x/2^k}|1\rangle\right)$, controlled by x, can easily be implemented in parallel using fanout gates as described before. This suffices to obtain the circuit for $|x\rangle|0\rangle \mapsto |x\rangle|\Psi_x\rangle$. This is followed by the relatively harder step $|x\rangle|\Psi_x\rangle \mapsto |0\rangle|\Psi_x\rangle$, which completes the QFT transformation $|x\rangle|0\rangle \mapsto |x\rangle|\Psi_x\rangle \mapsto |\Psi_x\rangle|0\rangle$. Cleve and Watrous [4] showed how to compute the second step (with high accuracy) using log-depth circuits built from single-qubit and CNOT gates. Høyer and Špalek further adapted Cleve and Watrous' technique to show that the QFT can be approximated with small error probability in QNC0_wf.
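The product form of $|\Psi_x\rangle$ is what makes the per-qubit rotations independent of one another. The sketch below is our own check (function names are hypothetical): it builds $Q|x\rangle$ from the defining sum and from the tensor product of the n single-qubit states, taking the k = 1 factor as the most significant output qubit, and compares the two.

```python
import cmath

def qft_column(x, n):
    """Amplitudes of Q|x> = 2^(-n/2) sum_y omega^(x*y) |y>, omega = e^(2*pi*i/2^n)."""
    N = 2 ** n
    return [cmath.exp(2j * cmath.pi * x * y / N) / N ** 0.5 for y in range(N)]

def qft_product_form(x, n):
    """Tensor product over k = 1..n of (|0> + e^(2*pi*i*x/2^k)|1>)/sqrt(2).

    The k = 1 factor is the most significant output qubit; each factor depends
    on x only through a phase, so all n rotations can be prepared in parallel.
    """
    state = [1.0]
    for k in range(1, n + 1):
        qubit = [1 / 2 ** 0.5, cmath.exp(2j * cmath.pi * x / 2 ** k) / 2 ** 0.5]
        state = [s * a for s in state for a in qubit]
    return state

print(all(abs(u - v) < 1e-9 for u, v in zip(qft_column(5, 3), qft_product_form(5, 3))))  # True
```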

The transformation $|x\rangle|\Psi_x\rangle \mapsto |0\rangle|\Psi_x\rangle$ is computed by first estimating $|\Psi_x\rangle|0\rangle \mapsto |\Psi_x\rangle|\tilde{x}\rangle$ using quantum Fourier phase estimation and then XORing $\tilde{x}$ into the register holding x. If the estimate is accurate enough, then $\tilde{x} = x$, so we end up with the desired state.

The quantum Fourier phase estimation requires several copies of $|\Psi_x\rangle$, which can be obtained by using the reversible addition gate $|x_1\rangle \cdots |x_t\rangle \mapsto |x_1\rangle \cdots |x_{t-1}\rangle|\sum_{i=1}^t x_i \bmod 2^n\rangle$. Addition modulo $2^n$ can be computed using threshold gates, which, as shown before, can be approximated in QNC0_wf. Using the inverse of the reversible addition gate, we can approximate in constant depth
$$|\Psi_x\rangle|0\rangle \cdots |0\rangle \mapsto |\Psi_x\rangle|\Psi_x\rangle \cdots |\Psi_x\rangle.$$
Denoting $\frac{1}{\sqrt{2}}\left(|0\rangle + e^{2\pi i x/2^k}|1\rangle\right)$ by $|p_{x/2^k}\rangle$, this gives us multiple copies of $|p_{x/2^k}\rangle$ for each $k = 1, \ldots, n$.

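The quantum Fourier phase estimation part of the circuit measures the copies of $|p_{x/2^k}\rangle$ to approximately determine all bits of x. Writing $x = \sum_{j=0}^{n-1} 2^j x_j$, we have $|p_{x/2^k}\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + e^{2\pi i (0.x_{k-1} \cdots x_0)}|1\rangle\right)$, and measuring copies of it in the bases
$$\left\{\tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle),\ \tfrac{1}{\sqrt{2}}(|0\rangle - |1\rangle)\right\} \quad\text{and}\quad \left\{\tfrac{1}{\sqrt{2}}(|0\rangle + i|1\rangle),\ \tfrac{1}{\sqrt{2}}(|0\rangle - i|1\rangle)\right\}$$
allows us to estimate the phase $0.x_{k-1} \cdots x_0$. Using $O(\log n/\epsilon)$ copies of each $|p_{x/2^k}\rangle$ gives us an estimate $\tilde{x}$ which is $\epsilon$-close to x. After computing $|x \oplus \tilde{x}\rangle|\Psi_x\rangle|\tilde{x}\rangle$, the ancillae holding $\tilde{x}$ are returned to their initial states to perform a clean computation. All the measurements and the estimation can be done in parallel, and the measurements can be deferred until the end. That gives us a constant-depth approximation for computing the QFT.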
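The sketch below is a classical simulation of the measurement statistics only, not of the circuit; the function names, the seed and the number of copies are our own assumptions. Measuring $|p_\phi\rangle$ in the first basis gives the “+” outcome with probability $(1 + \cos 2\pi\phi)/2$, and in the second basis with probability $(1 + \sin 2\pi\phi)/2$; estimating both from independent copies and combining them with atan2 recovers $\phi$ (mod 1), with an error that shrinks as the number of copies grows.

```python
import cmath
import math
import random

def plus_probability(phi, theta):
    """Probability of the outcome (|0> + e^{i*theta}|1>)/sqrt(2) when measuring
    |p_phi> = (|0> + e^{2*pi*i*phi}|1>)/sqrt(2) in the basis containing that vector."""
    overlap = (1 + cmath.exp(1j * (2 * math.pi * phi - theta))) / 2
    return abs(overlap) ** 2

def estimate_phase(phi, copies, rng):
    """Estimate phi (mod 1) from `copies` simulated copies of |p_phi>."""
    half = copies // 2
    # First basis {(|0>+|1>)/sqrt2, (|0>-|1>)/sqrt2}: P(+) = (1 + cos 2*pi*phi)/2
    p_cos = plus_probability(phi, 0.0)
    # Second basis {(|0>+i|1>)/sqrt2, (|0>-i|1>)/sqrt2}: P(+) = (1 + sin 2*pi*phi)/2
    p_sin = plus_probability(phi, math.pi / 2)
    hits_cos = sum(rng.random() < p_cos for _ in range(half))
    hits_sin = sum(rng.random() < p_sin for _ in range(half))
    cos_est = 2 * hits_cos / half - 1
    sin_est = 2 * hits_sin / half - 1
    return (math.atan2(sin_est, cos_est) / (2 * math.pi)) % 1.0

rng = random.Random(7)
print(estimate_phase(0.3, 4000, rng))  # close to 0.3
```

In the actual construction these measurements are deferred and the estimates are computed coherently; the simulation only illustrates why a modest number of copies per k suffices.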


Theorem 3.3 [14] The QFT can be approximated with polynomially small error by a quantum circuit of constant depth using fanout gates and a fixed family of single-qubit gates.


4 Lower Bounds

In the last section we saw the surprising power provided by allowing fanout gates in constant-depth quantum circuits. We now turn to the problem of whether, in the presence of single-qubit gates, fanout gates are strictly more powerful than (unbounded) Toffoli gates and are necessary for simulating stronger classes.

⁵Though this looks like an exponential improvement over the classical case, note that unlike the DFT, the QFT does not explicitly compute $(\beta_0, \ldots, \beta_{m-1})$. Intuitively, the difference between the DFT and the QFT is analogous to that between computing a probability distribution and sampling from the distribution [4].

We seek to fill a gap in our understanding of the relative power of Toffoli and fanout gates in quantum circuits. Roughly speaking, we know that given fanout, we can do fanin, where by fanin we mean quantum gates (for example, the Toffoli gate) in which one output qubit of the gate depends on the values of (unboundedly) many input qubits. But can fanout do more? Are generalized Toffoli gates and fanout gates equivalent in power, up to polynomial size and constant depth; i.e., are QAC0 and QAC0_wf equal? We believe they are not, and thus that fanout gates are strictly more powerful.

In answering this question we find it necessary to grapple with another likely limitation of real quantum computers. It is evident that they will be limited not only in their run-time duration but also in the number of qubits used in the computation, due to the difficulty of controlling the interactions of multiple qubits. It will be necessary to identify computations that use as few ancillae as possible. Thus we consider here the number of ancillae used by a circuit as an additional computational resource, and investigate cases where this resource is limited.

The main result of this section is that one cannot compute parity (and hence fanout) with QAC0 circuits using a constant number of ancillae. This is the first hard evidence that QAC0 and QAC0_wf may be different, and that fanout may be necessary for all the upper bound results mentioned in Section 3 (it certainly is if we limit our computations to only constantly many ancillae). The issue of the necessity of ancillae in quantum computations is a murky one. It is generally accepted that a limited number (polynomially many relative to the number of inputs) are allowed. This seems reasonable, as it allows polynomial extra space in which to carry out a computation. However, it is possible to approximate any unitary operator with a small set of universal gates without ancillae (although one needs circuits of exponential depth and size in order to do so [17]). Furthermore, to our knowledge, no systematic investigation into the absolute necessity of ancillae for efficient quantum computation has been done.

Our main lower bound theorem states:

Theorem 4.1 [8] Let C be a circuit of depth d consisting of single-qubit gates and Toffoli gates, and using 0 ancillae. Then, if $d \in o(\log n)$, C cannot compute the fanout operation.

The theorem can be generalized to circuits with a limited number of ancillae. Namely, in the case of a non-constant number a of ancillae and n input qubits, we have a tradeoff between a and the required depth that results in a non-constant lower bound for fanout when $a = n^{1-o(1)}$.

It is not hard to see, via a divide-and-conquer technique using CNOT gates, that one can compute parity in depth $2\log n + 1$. We conjecture that this is optimal no matter what a is, and regardless of the allowed size of Toffoli gates. However, the best lower bound on depth we can obtain is $1.44 \log n - 1$ for 0 ancillae. With a more careful analysis, we find that a circuit depth lower bound of at least $1.44 \log m - 1$ is required for any function with the property that, for any input string x, there is a set of m bits such that flipping any one of them changes f(x). (The integer m is known as the sensitivity of the function f [18].) So, more succinctly, any function of sensitivity m must have depth at least $1.44 \log m - 1$.
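One way to realize the $2\log n + 1$ upper bound with no ancillae at all is sketched below; it is our own construction, and the names are hypothetical. A tree of CNOTs folds the parity of the inputs into a single wire in $\lceil \log_2 n \rceil$ layers, one CNOT copies it into b, and the tree is run backwards to restore the inputs, for $2\lceil \log_2 n \rceil + 1$ layers in total. As with fanout, checking all classical inputs suffices because the circuit only permutes basis states.

```python
import math
from itertools import product

def parity_layers(n):
    """CNOT layers for |x_1..x_n, b> -> |x_1..x_n, b XOR parity(x)> with no ancillae.

    Input wires are 0..n-1 (0-based) and the target wire is "b".
    """
    tree, stride = [], 1
    while stride < n:
        # Wire j accumulates wire j + stride for j = 0, 2*stride, 4*stride, ...
        tree.append([(j + stride, j) for j in range(0, n - stride, 2 * stride)])
        stride *= 2
    # Wire 0 now holds the parity: copy it into b, then undo the tree.
    return tree + [[(0, "b")]] + tree[::-1]

def run_parity(layers, xs, b):
    """Simulate the layers on classical bits; returns (final inputs, final b)."""
    wires = dict(enumerate(xs))
    wires["b"] = b
    for layer in layers:
        used = [w for gate in layer for w in gate]
        assert len(used) == len(set(used)), "gates within a layer must be disjoint"
        for control, target in layer:
            wires[target] ^= wires[control]
    return [wires[j] for j in range(len(xs))], wires["b"]

print(len(parity_layers(8)), run_parity(parity_layers(8), (1, 0, 1, 1, 0, 0, 1, 0), 0))
# 7 ([1, 0, 1, 1, 0, 0, 1, 0], 0)
```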

The proof of the lower bound is quite long and can be found in [8]. We limit ourselves here to a brief discussion of the central ideas of the proof and its most interesting aspects.

We first consider the case of QNC0 circuits, in which the quantum circuit contains only 1- and 2-qubit gates. The intuition behind the proof for this case of the main theorem seems quite obvious. Namely, if such a circuit has depth d, then any output qubit of the circuit can depend on at most $2^d$ input qubits. This fact is obvious for classical circuits, by a simple connectivity argument. One might think it equally easy in the quantum case, but the picture of a quantum computation as a “classical” circuit can be deceiving. It is therefore important to verify carefully the intuitive fact that a quantum circuit must connect all the qubits on which its output depends to the qubit we will measure for the output. Furthermore, the technique we use here underscores the difficulties for the more general lower bound theorem for circuits including large Toffoli gates.


Figure 4: Decomposition of the layers of the QNC0 circuit C.

Let $C = L_1 \cdots L_d$ consist entirely of arbitrary two-qubit gates and single-qubit gates. (The extension to arbitrary but fixed-size gates is straightforward.) Further suppose that $M$ is an observable on a single qubit in the last layer. Let $L'_1$ denote the gate whose output $M$ is measuring (Figure 4). $L'_1$ could be a two-qubit or a single-qubit gate. In either case, $L_1 = L'_1 \otimes R_1$, where $R_1$ is the tensor product of all the other gates in that layer, if any. If we had only this one layer, the result of the measurement $M$ would be determined by the expectation value of the operator $(L'_1 \otimes R_1)^\dagger M (L'_1 \otimes R_1)$. Since $M$ commutes with $R_1$, the factors $R_1^\dagger$ and $R_1$ cancel, and the only gate involved is $L'_1$. We proceed to include more layers, decomposing layer $i$ as $L_i = L'_i \otimes R_i$, where $L'_i$ is a transformation that acts on the subset of the bits involved with $M$, and $R_i$ acts on the rest. The same thing happens with these other layers: $M$ remains sandwiched between some of the operators $L'_i$, but the $R_i$'s cancel, and by induction the width of the layers $L'_i$ that remain is at most $2^d$.
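The connectivity argument above can be phrased as a small computation on the circuit's wiring. The Python sketch below is our own illustration, not code from [8]: it represents each layer as a list of gates, each gate given by the tuple of qubit indices it touches (an assumed encoding), and walks backward from the measured qubit, keeping exactly the qubits that the surviving operators $L'_i$ act on. It tracks supports only, not the unitaries themselves.

```python
from typing import List, Sequence, Set, Tuple

# Assumed encoding: a layer is a list of disjoint gates, each gate being the
# tuple of qubit indices it acts on.
Layer = List[Tuple[int, ...]]


def backward_light_cone(layers: Sequence[Layer], output_qubit: int) -> Set[int]:
    """Qubits at the input side that can influence `output_qubit`.

    layers[0] is L_1 (the last layer applied) and layers[-1] is L_d,
    matching the product C = L_1 ... L_d.
    """
    support = {output_qubit}
    for layer in layers:  # walk from the output back toward the inputs
        grown = set(support)
        for gate in layer:
            # Gates touching the current support form L'_i and pull in all
            # of their qubits; the remaining gates form R_i and cancel.
            if support.intersection(gate):
                grown.update(gate)
        support = grown
    return support


# A depth-3 binary tree of two-qubit gates funnelling 8 qubits into qubit 0.
tree = [
    [(0, 4)],                          # L_1: last layer applied
    [(0, 2), (4, 6)],                  # L_2
    [(0, 1), (2, 3), (4, 5), (6, 7)],  # L_3: first layer applied
]
print(sorted(backward_light_cone(tree, 0)))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Since the gates in a layer are disjoint and each touches at most two qubits, the support can at most double per layer. In the example a depth-3 tree reaches all $8 = 2^3$ inputs, so the $2^d$ bound on the width cannot be improved in general.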

From this result on the connectivity of quantum circuits, we immediately obtain the following theorem.

Theorem 4.2 [8] Let C be a QNC^0 circuit on n inputs and depth d, with any number of ancillas, that cleanly computes parity exactly. Then $d > \log n$. If C computes fanout in the same way, then $d > \log n - 2$.

The theorem actually applies not only to parity but to any function whose output depends on all of its inputs. The proof technique can also be used to establish that constant-depth circuit families with gates of bounded arity cannot simulate generalized (unbounded) Toffoli gates.

We now turn to separating circuits with (unbounded) Toffoli gates from circuits with fanout gates. To see how to proceed, it is useful to briefly consider classical circuits with similar constraints. Suppose we have a classical circuit with NOT gates and unbounded fan-in AND and OR gates, but we do not allow any fanout: once inputs (or outputs of other gates) are used in an AND or an OR gate, they cannot be used again. It is obvious that if such a circuit has constant depth, it cannot compute functions such as parity. The AND and OR gates can be killed off by fixing the values of a small set of inputs, resulting in a constant function, while parity depends on all the inputs.
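The classical killing-off argument can be made concrete for fanout-free circuits, which are exactly trees (read-once formulas). The Python sketch below is an illustration under our own encoding, with hypothetical names: `force` returns a partial assignment that forces a gate to a chosen value, setting a single child to the gate's controlling value (0 for AND, 1 for OR) where possible, and `kill` picks the cheaper output value at the root. Because no input is shared between subtrees, the partial assignments of different children never conflict.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple

# Hypothetical encoding of a fanout-free circuit as a tree:
#   ("var", i) | ("not", child) | ("and", [children]) | ("or", [children])


def force(node, value: int) -> Dict[int, int]:
    """Partial assignment that forces `node` to output `value` (greedy)."""
    kind = node[0]
    if kind == "var":
        return {node[1]: value}
    if kind == "not":
        return force(node[1], 1 - value)
    controlling = 0 if kind == "and" else 1
    if value == controlling:
        # One child at the controlling value kills the gate; take the cheapest.
        return min((force(c, value) for c in node[1]), key=len)
    # Otherwise every child must take the non-controlling value.
    out: Dict[int, int] = {}
    for c in node[1]:
        out.update(force(c, value))
    return out


def kill(root) -> Tuple[int, Dict[int, int]]:
    """Choose the root value that can be forced with the fewest fixed inputs."""
    return min(((v, force(root, v)) for v in (0, 1)), key=lambda p: len(p[1]))


def evaluate(node, x: List[int]) -> int:
    kind = node[0]
    if kind == "var":
        return x[node[1]]
    if kind == "not":
        return 1 - evaluate(node[1], x)
    vals = [evaluate(c, x) for c in node[1]]
    return int(all(vals)) if kind == "and" else int(any(vals))


def completions(n: int, fixed: Dict[int, int]) -> Iterator[List[int]]:
    """All inputs in {0,1}^n that agree with the partial assignment `fixed`."""
    free = [i for i in range(n) if i not in fixed]
    for bits in product((0, 1), repeat=len(free)):
        x = [0] * n
        for i, b in fixed.items():
            x[i] = b
        for i, b in zip(free, bits):
            x[i] = b
        yield x


# An OR of three ANDs on 6 inputs, each input used exactly once.
circuit = ("or", [("and", [("var", 0), ("var", 1)]),
                  ("and", [("var", 2), ("not", ("var", 3))]),
                  ("and", [("var", 4), ("var", 5)])])
value, fixed = kill(circuit)
print(value, fixed)                                           # 1 {0: 1, 1: 1}
print({evaluate(circuit, x) for x in completions(6, fixed)})  # {1}
print({sum(x) % 2 for x in completions(6, fixed)})            # {0, 1}
```

With only inputs 0 and 1 fixed, the circuit is constant while parity still takes both values on the remaining inputs, so this circuit does not compute parity.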

In the quantum case, it again appears that the only thing to do is to attempt to "kill off" the large Toffoli gates. However, the quantum case is much more subtle, since we must face the fact that intermediate states are superpositions of computational basis states, and furthermore that the Toffoli gates, in combination with the single-qubit gates, may create entanglement.


Figure 5: The sets $R_i$ and $L'_i$ for each layer $i$. A Z-gate involving bits in both sets is shown.

Assume we have a circuit C composed of d levels $L_1, L_2, \ldots, L_d$. The circuit C transforms the state $|\Psi\rangle$ to $L_1 \cdots L_d |\Psi\rangle$. We assume without loss of generality that each layer $L_i$ is a tensor product of Z-gates (see footnote 6) and single-qubit gates. Further assume without loss of generality that a specific bit (say, the nth bit) of C serves as the output or target bit (which eventually is supposed to agree with the output bit of a parity gate).
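Footnote 6 describes a Z-gate as a Toffoli gate conjugated with Hadamards at the target. The NumPy sketch below checks this identity numerically for a few small n; the dense-matrix construction and the qubit ordering (first qubit most significant, target last) are our own conventions, chosen only for the check.

```python
from functools import reduce

import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)


def generalized_toffoli(n: int) -> np.ndarray:
    """NOT on the last (target) qubit iff the other n-1 qubits are all 1."""
    dim = 2 ** n
    U = np.eye(dim)
    a, b = dim - 2, dim - 1  # basis states |1...10> and |1...11>
    U[[a, b]] = U[[b, a]]    # swap the two rows
    return U


def z_gate(n: int) -> np.ndarray:
    """Flip the sign of |x_1,...,x_n> iff all x_i = 1."""
    d = np.ones(2 ** n)
    d[-1] = -1
    return np.diag(d)


def hadamard_on_last(n: int) -> np.ndarray:
    """H on the target qubit, identity on the others."""
    return reduce(np.kron, [I2] * (n - 1) + [H])


for n in range(1, 5):
    Ht = hadamard_on_last(n)
    # expected: True for every n
    print(n, np.allclose(Ht @ generalized_toffoli(n) @ Ht, z_gate(n)))
```

The resulting matrix is diagonal with a single -1 at $|1\cdots1\rangle$, so it is unchanged by any permutation of the qubits, which is why a Z-gate has no preferred target bit.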

Our method is to work backward through the levels of C, starting from the target qubit (refer to Figure 5). We fix the target bit to 0. We prove that there is a small set of bits in layer $i$ that are involved in a quantum state $|\Psi_i\rangle$ that serves (when used as input at that layer) to keep the target bit at its fixed value 0. Bits that are not involved in the construction of the state in the previous layer but that are inputs to Z-gates are set to 0 (to kill the Z-gate; this set of bits is denoted $R_i$ in Figure 5). Bits that are involved (denoted $L'_i$ in Figure 5) are allowed to propagate backwards through the layer, which can result in entanglement. The number of bits that are "committed to 0" (the size of $L'_i$) at most doubles at each layer as we work backwards. Thus the number of bits involved in $|\Psi_i\rangle$ at the $i$th level is at most $2^i$. For the full circuit of depth $d$, there therefore exists a state $|\Psi_d\rangle$ involving only $2^d$ bits that, when applied to the circuit, forces the output to be 0. Hence if there are more than $2^d$ input bits to the circuit, the target bit is insensitive to some inputs and the circuit cannot compute parity exactly.

The above sketch works as well for circuits with a limited number of ancillae. The idea is simply to fix the ancillae just as we fixed the target bit, and to construct a state that guarantees they are 0 by working backwards. However, fixing many ancillae (e.g., n of them) could force us to use all the inputs, and then the proof breaks down. Thus, for a small number of ancillae, we obtain a tradeoff between the depth of the circuit and the number of ancillae it may have. Furthermore, a more careful analysis shows that bits are committed to 0 in approximately alternate layers, which leads to the golden ratio factor in the theorem.

6Such gates flip the sign of $|x_1, \ldots, x_n\rangle$ iff $\bigwedge_{i=1}^{n} x_i = 1$. They are equivalent to Toffoli gates conjugated with Hadamards at the target. They are more convenient to use here, as they do not have a preferred target bit.

Theorem 4.3 [8] Let C be a depth d circuit of n inputs, consisting of single-qubit gates and Z-gates, and using a ancillae, with $n > (a+1)\phi^{d+1}$, where $\phi = (1+\sqrt{5})/2$ is the golden ratio. If $d < \frac{\log n}{\log \phi} - 1 \approx 1.44 \log n - 1$, then C cannot compute $P$, the parity gate with $n - 1$ inputs and one target.
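Read the other way around, the hypothesis $n > (a+1)\phi^{d+1}$ of Theorem 4.3 tells us up to which depth the argument applies for a given number of ancillae. The small Python helper below, with hypothetical names, only evaluates that inequality to find the largest covered depth; it says nothing about circuits outside the theorem's hypothesis.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, about 1.618


def hypothesis_holds(n: int, d: int, a: int) -> bool:
    """Check n > (a + 1) * phi**(d + 1), the size condition in Theorem 4.3.

    For a >= 0 this already implies d < log_phi(n) - 1.
    """
    return n > (a + 1) * PHI ** (d + 1)


def largest_covered_depth(n: int, a: int) -> int:
    """Largest depth d >= 0 for which the hypothesis holds, or -1 if none."""
    d = -1
    while hypothesis_holds(n, d + 1, a):
        d += 1
    return d


for a in (0, 10, 1000):
    print(a, largest_covered_depth(1024, a))
# 0 13
# 10 8
# 1000 -1
```

With no ancillae the argument covers depths up to 13 for n = 1024, consistent with $1.44 \log 1024 - 1 \approx 13.4$; with 1000 ancillae the hypothesis already fails at depth 0, illustrating the depth/ancilla tradeoff.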

The circuits described above required the ancillae to be initialized to 0. Moreover, all the circuits performed clean computation, i.e., the ancillae were 0 at the end. Clean computation ensures that circuits can be easily composed and ancillae reused. There could also be circuits that require the ancillae to be initialized to some other specific state; a clean circuit would return the ancillae to that initial state at the end.

In some cases we might want the circuit to work with any value in the ancillae, since initializing the ancillae to a specific state might be difficult. A standard technique for clean computation is to copy the result to the output qubit and then apply the reverse computation to return the ancillae to their initial state. For some circuits this technique does not work. Parker and Plenio [20] show that arbitrary initial states can be used in the quantum part of Shor's factoring circuit. Fang et al. [8] call a computation robust if the circuit works with any initial state of the ancillae and returns the ancillae to that initial state at the end. This puts a stronger constraint on the circuit. It can be shown that any circuit needs depth at least $\log n$ to robustly compute parity using only single-qubit gates and Toffoli gates, regardless of the number of ancillae used.
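The standard compute, copy, uncompute technique mentioned above can be illustrated on classical basis states, where Toffoli and CNOT gates act as reversible bit operations. The Python sketch below uses a hypothetical layout (inputs 0 to 2, ancillae 3 and 4 initialized to 0, output 5) to compute the AND of three bits cleanly; it shows why the ancillae end at 0, but it does not model superpositions or the robust setting of [8].

```python
from itertools import product
from typing import List, Sequence


def toffoli(bits: List[int], c1: int, c2: int, t: int) -> None:
    bits[t] ^= bits[c1] & bits[c2]


def cnot(bits: List[int], c: int, t: int) -> None:
    bits[t] ^= bits[c]


def compute_and3(bits: List[int]) -> None:
    toffoli(bits, 0, 1, 3)  # ancilla 3 <- x0 AND x1
    toffoli(bits, 3, 2, 4)  # ancilla 4 <- (x0 AND x1) AND x2


def uncompute_and3(bits: List[int]) -> None:
    # The same gates in reverse order; each Toffoli is its own inverse.
    toffoli(bits, 3, 2, 4)
    toffoli(bits, 0, 1, 3)


def clean_and3(x: Sequence[int]) -> List[int]:
    """Hypothetical layout: inputs 0-2, ancillae 3-4 (start at 0), output 5."""
    bits = list(x) + [0, 0, 0]
    compute_and3(bits)
    cnot(bits, 4, 5)      # copy the result to the output bit
    uncompute_and3(bits)  # run the computation backwards to clean the ancillae
    return bits


for x in product((0, 1), repeat=3):
    bits = clean_and3(x)
    assert bits[:3] == list(x)            # inputs unchanged
    assert bits[3:5] == [0, 0]            # ancillae clean again
    assert bits[5] == x[0] & x[1] & x[2]  # output holds the AND
```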

5  Open  Problems 

1. Is QAC^0 properly contained in QAC^0_wf? Stated in more detail, is it impossible to compute a fanout gate in constant depth with polynomially many generalized Toffoli gates, single-qubit gates, and polynomially many ancillae?

2. A similar question applies to QTC^0 versus QTC^0_wf. It is unknown whether threshold and fanout gates are equally powerful.

3. Can QTC^0 or QTC^0_wf be simulated exactly by QACC circuits, using a fixed number of single-qubit gates?

4. We have seen three different classes of constant-depth quantum circuits, namely QNC^0, QAC^0, and QAC^0_wf. (QNC^0 is provably different from QAC^0 and QAC^0_wf, and the results of Section 4 give evidence that QAC^0 and QAC^0_wf are different as well.) The last class, QAC^0_wf, is quite general, and its computational power does not change even if we add mod gates and threshold gates. Can every class of constant-depth quantum circuits composed of single-qubit gates and some finite set of other gates be simulated by one of the classes QNC^0, QAC^0, or QAC^0_wf?

5. We sketched how a threshold gate can be approximated by fanout and single-qubit gates in constant depth with polynomially small error. Can the error be made exponentially small while preserving constant depth?


6. The ability to compute either parity or fanout in constant depth (provided we have Hadamard and CNOT gates) is equivalent to the ability to create a "cat state" of the form $\frac{1}{\sqrt{2}}(|00\cdots0\rangle + |11\cdots1\rangle)$ (with n 0's and n 1's) in constant depth [8, 15]. Assuming the answer to the question posed in (1) is affirmative, what insight does this give us into the nature of entanglement? In particular, does a useful entanglement measure emerge (in terms of the size and depth requirements for producing a state)? Alternatively, can existing entanglement measures (see, for example, [5]) be used to answer the question posed in (1)?

7. A significant roadblock to lower bounds in this area is the presence of ancillas. The existing results give trade-offs between circuit depth and the number of ancillas. What techniques can be developed to obtain stronger trade-offs of this type, either in this context or in more general settings?

8. There are further interesting questions concerning the power of ancillas in quantum computation that have never been fully explored. For example, does there exist a function that needs more than linearly many ancillas when computed by a constant-depth quantum circuit family? More specifically, is it possible to compute parity with Toffoli and single-qubit gates using linearly many ancillae? Many other variants of this question arise easily and have no ready answer.
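One direction of the equivalence in open problem 6 is easy to see: a Hadamard on one qubit followed by a single fanout gate copying that qubit to the others produces a cat state in depth 2. The NumPy sketch below checks this for $n = 4$; the matrix constructions and qubit ordering are our own, and the converse direction established in [8, 15] is not illustrated.

```python
from functools import reduce

import numpy as np


def hadamard_on_first(n: int) -> np.ndarray:
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    return reduce(np.kron, [H] + [np.eye(2)] * (n - 1))


def fanout(n: int) -> np.ndarray:
    """Permutation matrix XOR-ing the first qubit (most significant bit)
    into each of the other n-1 qubits."""
    dim = 2 ** n
    half = dim // 2
    U = np.zeros((dim, dim))
    for x in range(dim):
        # If the control bit is 1, flip all n-1 target bits (mask half - 1).
        y = x ^ (half - 1) if x >= half else x
        U[y, x] = 1
    return U


n = 4
zero = np.zeros(2 ** n)
zero[0] = 1.0                                 # |00...0>
state = fanout(n) @ hadamard_on_first(n) @ zero
cat = np.zeros(2 ** n)
cat[0] = cat[-1] = 1 / np.sqrt(2)             # (|00...0> + |11...1>)/sqrt(2)
print(np.allclose(state, cat))                # True
```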